Rational and Methodology

Reasoning

Virginia Public Schools enroll over 1.2 million students and include around 2000 schools with 6th highest attainment rate of post secondary education in the country. The school system need to be properly preparing their students for the future especially those from historically marginalized demographic groups which in recent years have significantly increased post secondary attainment rates. The goal of this visualization is to compare fail rates of different demographic groups in order to look at preparedness for the future and potentially further education opportunities.

Methodology

In order to compare educational preparedness I utilized fail rates on Standards of Learning (SOLs), Virginia’s state mandated tests collected by the Virginia Department of Education.I decided to put an emphasis on county level averages rather than individual school or state level in an attempt to minimize the influence of outlier’s and due to website downloading issues. The data includes subsets for various demographic group such a gender, race, socioeconomic status, disability status, and home status.

Layout Decision

I decided to encompass the final visualization into a Dashboard in order to be able to view each graph separately instead of having 7 on the same page. The first tab depicts the averages for all counties in a map so the viewer can easily see the increase of scores over time. This is paired with the arrow graph in order to see the positive improvement for all demographic groups. All the other tabs allow for the user to focus in on a certain comparisons with the histogram allowing for an easy way to view the distribution of score averages of counties. This is potentially a draft for a interactive website to explore data about Virginia Public Schools.

Visualizations and Associated Code

Pack Loading and warning supressing from knited HTML

knitr::opts_chunk$set(warning = FALSE, message = FALSE)
pacman::p_load(here, tidyverse, plotly, dplyr, sf, ggthemes, tidygeocoder, magick, stringr, flex_dashboard) #Necessary Packages 

General Data Cleaning

us_counties <- read_sf(here( "data", "data", "cb_2021_us_county_500k","cb_2021_us_county_500k.shp")) #County SF

va_counties <- us_counties %>% 
  filter(STATE_NAME == "Virginia") #Sub set to only include Va counties

school <- read_csv(here("Datasets", "Assessments.csv")) #Read in assessment data from VDOE
school_2 <- school %>% 
  filter(Division == "Williamsburg-James City County Public Schools" | Division == "Greensville County Public Schools" | Division == "Alleghany Highlands Public Schools" | Division =="Fairfax County Public Schools") %>%
  mutate(Division = case_when(
  Division == "Williamsburg-James City County Public Schools"  ~ "James City County",
   Division == "Alleghany Highlands Public Schools"  ~ "Covington city",
    Division == "Fairfax County Public Schools" ~ "Fairfax city",
  Division == "Greensville County Public Schools" ~ "Emporia city", #Change name of counties to match SF
  TRUE ~ Division# Keeps all other names unchanged
  )) 

school_3<-school %>%
      bind_rows(school_2) #Recombine the updated shape files 
  
cl_sch <- school_3 %>% 
  filter(Subgroup =="All Students" & Subject=="Mathematics") %>% #Subset to only include all student averages and math assessments 
  mutate(Division = str_remove(Division, "Public Schools$")) %>% #Drop Public school from the end of county names
  mutate(
    Division = str_replace_all(Division, regex("city", ignore_case = TRUE), "city")
  )%>% #find all instances of city ignoring casing and change to "city"
  mutate(Division = trimws(Division, which = "right")) %>% #remove weird spacing
  mutate(Division = case_when(
    Division == "Charles city County"  ~ "Charles City County", #Fix Capitalization on weird cases
    Division == "Williamsburg-James city County"  ~ "Williamsburg city",
    Division == "Alleghany Highlands"  ~ "Alleghany County",
    Division == "James city County"  ~ "James City County", 
    TRUE ~ Division       # Keeps all other names unchanged
  )) %>%
  mutate(Fail = as.numeric(Fail)) %>% #Turn fail from a character value to a numeric 
  rename(NAMELSAD = Division) #Rename division column to NAMELSAD in order to match SF

va_school <- left_join(va_counties, # dataset for county maps
                  cl_sch, #fail rate information
                  by = "NAMELSAD")

Percent of Students Failed Mathematics Assesments by County in Virginia

p <- ggplot(data = va_school, na.rm=TRUE) +
  geom_sf(aes(fill = Fail, text= paste("County Name:", NAMELSAD, "\n Percentage of Students who Failed:", Fail, "%" ))) + 
  ggthemes::theme_map() +
  scale_fill_continuous(low = "green", high = "red", 
                      name = "Percentage Failed", guide="none") + #low fail rates are classified as green/high as red
  labs(title = "Red Indicates High Fail Rates while Green Repersents Lower Fail Rates.")+
   facet_wrap(~Year, nrow=3) #3 maps of Virginia in a line 
ggplotly(p, tooltip="text") #add interactive elements

Arrow Graph

arrow <- school_3 %>% #Change in demographic type 
  filter((Year =="2020 - 2021"|Year== "2022 - 2023")) %>%#Only include first and last available years 
  filter(Subgroup== "Students with Disabilities"|Subgroup== "All Students"|Subgroup== "Black"|Subgroup== "Economically Disadvantaged" |Subgroup== "Female" |Subgroup== "Foster Care" |Subgroup== "Hispanic" |Subgroup== "Homeless"|Subgroup== "Migrant" |Subgroup== "Male" | Subgroup== "Not Economically Disadvantage" | Subgroup== "White" | Subgroup== "Students with Disabilities" | Subgroup== "Students without Disabilities")%>%
  mutate(Fail = as.numeric(Fail)) %>% #change from character to numeric variable 
  group_by(Subgroup, Year) %>% #groups answers by demographic and year
  summarize(Mean_Fail = mean(Fail, na.rm = TRUE)) #averages all counties fail rates demographics and year

Percentage of Students Who Failed by Demogrpahic Group

p <- ggplot(arrow, aes(x=Mean_Fail, y=reorder(Subgroup, Mean_Fail))) #Create arrow chart and order subgroup by percent fail
p+
  geom_point(size=5, aes(colour=Year)) + #Specify start and end points
  geom_line(arrow = arrow(length=unit(0.30,"cm"), ends="first", type = "closed"))+ #Specific arrow type
  scale_color_manual(values = c("2020 - 2021" = "red", "2022 - 2023" = "green")) + 
  theme_bw()+
  theme(legend.position = "top", axis.text.y = element_text(angle = 40))+ #Slant x axis for cleaner view
    labs(x = "Percent of Students Failed",
         y = "Demographic Groups")

Gender Comparison

type_sch <- school_3 %>% 
  filter((Subgroup =="Female"|Subgroup== "Male") & `Subject` =="Mathematics") %>% #Subset Datasets
  mutate(Fail = as.numeric(Fail)) #Fail to Numeric 

mean_values <- type_sch %>% # find averages for lines on histograms for each facet wrap
  group_by(Subgroup, Year) %>%
  summarize(mean_Fail = mean(Fail, na.rm = TRUE))
p <- ggplot(data = type_sch, mapping = aes(x = Fail)) #data from subseted type school
p + geom_histogram() +
  facet_wrap(~Subgroup~Year) + #Separate histogram by subgroup and year
  geom_vline(data = mean_values, aes(xintercept = mean_Fail, group = interaction(Subgroup, Year)),
             color = "blue")+ #Add in average line on the histogram
  theme_bw()+
    labs(x = "Percent of Students who Failed",
         y = "Number of Counties",
         title = "Percent of Students Failed by Gender")

Race Comparison

race_sch <- school_3 %>% 
  filter(Subject=="Mathematics" & (Subgroup== "Black" | Subgroup== "White" | Subgroup== "Asian" | Subgroup == "Hispanic"))  %>% #Subset by race
  mutate(Fail = as.numeric(Fail)) #Convert Character to numeric 

mean_fail_race <- race_sch %>% #Fined average for lines on histograms for each facet wrap
  group_by(Subgroup, Year) %>%
  summarize(mean_Fail = mean(Fail, na.rm = TRUE))
p <- ggplot(data = race_sch, mapping = aes(x = Fail)) #Data from subset
p + geom_histogram() +
  facet_wrap(~Year~Subgroup, nrow=3) + #align by year 
  geom_vline(data = mean_fail_race, aes(xintercept = mean_Fail, group = interaction(Subgroup, Year)),
             color = "blue")+ #add in histogram lines
  theme_bw()+
    labs(x = "Percent of Students who Failed",
         y = "Number of Counties",
         title = "Percent of Students Failed by Race")

Economic Comparison

econ_sch <- school_3 %>% 
  filter((Subgroup =="Economically Disadvantaged"|Subgroup== "Not Economically Disadvantaged") & `Subject` =="Mathematics") %>% #filter specific demographics 
  mutate(Fail = as.numeric(Fail)) #Character to numeric

mean_fail_ec <- econ_sch %>% #histogram averages
  group_by(Subgroup, Year) %>%
  summarize(mean_Fail = mean(Fail, na.rm = TRUE))
p <- ggplot(data = econ_sch, mapping = aes(x = Fail))
p + geom_histogram() +
  facet_wrap(~Subgroup~Year) +
  geom_vline(data = mean_fail_ec, aes(xintercept = mean_Fail, group = interaction(Subgroup, Year)),
             color = "blue")+ #add hisotgram lines
  theme_bw()+
    labs(x = "Percent of Students who Failed",
         y = "Number of Counties",
         title = "Percent of Students Failed by Economic Status")

Disability Status Comparison

dis_sch <- school_3 %>% 
  filter((Subgroup =="Students with Disabilities"|Subgroup== "Students without Disabilities") & `Subject` =="Mathematics") %>% #Subset
  mutate(Fail = as.numeric(Fail)) #To numeric 

mean_fail_dis <- dis_sch %>% #Create averages for histogram
  group_by(Subgroup, Year) %>%
  summarize(mean_Fail = mean(Fail, na.rm = TRUE))
p <- ggplot(data = dis_sch, mapping = aes(x = Fail))
p + geom_histogram() +
  facet_wrap(~Subgroup~Year) + 
  geom_vline(data = mean_fail_dis, aes(xintercept = mean_Fail, group = interaction(Subgroup, Year)),
             color = "blue")+ #add in average lines
  theme_bw()+
    labs(x = "Percent of Students who Failed",
         y = "Number of Counties",
         title = "Percent of Students Failed by Disability Status")

Home Status Comparison

home_sch <- school_3 %>% 
  filter((Subgroup =="Homeless"|Subgroup== "Foster Care"|Subgroup== "Migrant") & `Subject` =="Mathematics") %>% #Subset
  mutate(Fail = as.numeric(Fail)) #Character as numeric 

mean_fail_home <- home_sch %>% #add averages
  group_by(Subgroup, Year) %>%
  summarize(mean_Fail = mean(Fail, na.rm = TRUE))
p <- ggplot(data = home_sch, mapping = aes(x = Fail))
p + geom_histogram() +
  facet_wrap(~Subgroup~Year) +
  geom_vline(data = mean_fail_home, aes(xintercept = mean_Fail, group = interaction(Subgroup, Year)),
             color = "blue")+ #add average line
  theme_bw()+
    labs(x = "Percent of Students who Failed",
         y = "Number of Counties",
         title = "Percent of Students Failed by Home Status")